An efficient high-probability algorithm for Linear Bandits
Authors
Abstract
For the linear bandit problem, we extend the analysis of algorithm CombEXP from Combes et al. [2015] to the high-probability case against adaptive adversaries, allowing actions to come from an arbitrary polytope. We prove a high-probability regret of O(T^(2/3)) for time horizon T. While this bound is weaker than the optimal O(√T) bound achieved by GeometricHedge in Bartlett et al. [2008], CombEXP is computationally efficient, requiring only an efficient linear optimization oracle over the convex hull of the actions.
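The O(T^(2/3)) rate in the abstract is the rate that forced exploration at schedule ε_t ~ t^(-1/3) produces. The sketch below is not the paper's CombEXP; it is a toy ε-greedy run on a K-armed instance (with made-up arm means and an assumed Gaussian noise model) that only illustrates why this exploration schedule yields sublinear, T^(2/3)-order regret:

```python
import random

def eps_greedy_bandit(means, horizon, seed=0):
    """Toy K-armed bandit run with exploration rate eps_t ~ t^(-1/3).

    Illustrative only: forced exploration at this rate is a standard way
    to obtain O(T^(2/3)) regret, but this is NOT the CombEXP algorithm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    estimates = [0.0] * k
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        eps = min(1.0, t ** (-1.0 / 3.0))
        if rng.random() < eps:
            arm = rng.randrange(k)                  # explore uniformly
        else:
            arm = estimates.index(max(estimates))   # exploit best estimate
        reward = means[arm] + rng.gauss(0, 0.1)     # assumed Gaussian noise
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - means[arm]                 # pseudo-regret
    return regret
```

With this schedule the expected number of exploration rounds up to T scales as Σ t^(-1/3) ≈ (3/2)T^(2/3), which is where the T^(2/3) term in the regret comes from.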
Similar resources
CBRAP: Contextual Bandits with RAndom Projection
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful alternative for solving practical problems of sequential decisions, e.g., online advertisements. In the era of big data, contextual data usually tend to be high-dimensional, which leads to new challenges for traditional linear bandits mostly designed for the setting of low-dimensional contextual d...
Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance than state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we d...
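As a concrete illustration of the heuristic this snippet describes, here is a minimal Thompson Sampling sketch for the simpler Bernoulli K-armed case with Beta(1,1) priors; the contextual/linear-payoff variant the paper studies maintains a posterior over a parameter vector instead. The arm probabilities and horizon below are made-up example values:

```python
import random

def thompson_bernoulli(probs, horizon, seed=0):
    """Minimal Thompson Sampling for a Bernoulli K-armed bandit.

    Each arm keeps a Beta posterior; every round we sample a mean from
    each posterior and pull the arm with the largest sample. A toy
    sketch, not the contextual/linear-payoff algorithm of the paper.
    """
    rng = random.Random(seed)
    k = len(probs)
    alphas = [1] * k   # Beta alpha parameters (prior successes + 1)
    betas = [1] * k    # Beta beta parameters (prior failures + 1)
    pulls = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alphas[i], betas[i]) for i in range(k)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < probs[arm] else 0
        alphas[arm] += reward
        betas[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Randomizing over posterior samples gives the algorithm its exploration: arms with uncertain posteriors occasionally produce large samples and get pulled, while clearly inferior arms are pulled less and less often.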
Parametric Bandits: The Generalized Linear Case
We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive finite-time, high-probability bounds on the regret of the algorithm, extending previous analyses developed for linear bandits to the non-linear case. The analysis highlights a key difficulty in genera...
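GLM-UCB builds on the optimism principle behind UCB-style algorithms: pull the arm whose high-probability upper confidence bound on the mean reward is largest. The sketch below shows that principle for a plain K-armed bandit (classic UCB1 with an assumed Gaussian noise model and made-up arm means); it is not the paper's GLM-UCB, which instead builds its confidence region around a GLM parameter estimate:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """UCB1 on a K-armed bandit: after one round-robin pass, pull the
    arm maximizing empirical mean + sqrt(2 ln t / n), a high-probability
    upper confidence bound on its true mean. Illustrative sketch only.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    estimates = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                              # pull each arm once
        else:
            scores = [estimates[i] + math.sqrt(2 * math.log(t) / counts[i])
                      for i in range(k)]
            arm = scores.index(max(scores))          # optimistic choice
        reward = means[arm] + rng.gauss(0, 0.1)      # assumed Gaussian noise
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts
```

The confidence radius sqrt(2 ln t / n) shrinks as an arm is pulled, so suboptimal arms are pulled only O(log T) times; the GLM setting complicates exactly this step because the confidence region for the parameter must pass through a non-linear link function.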
G: Bandits, Experts and Games, 09/12/16, Lecture 4: Lower Bounds (ending); Thompson Sampling
Here is a parameter to be adjusted in the analysis. Recall that K is the number of arms. We considered a “bandits with predictions” problem, and proved that it is impossible to make an accurate prediction with high probability if the time horizon is too small, regardless of what bandit algorithm we use to explore and make the prediction. In fact, we proved it for at least a third of problem ins...
An Efficient Method for Selecting a Reliable Path under Uncertainty Conditions
In a network where some paths may become blocked, choosing a reliable path whose survival probability is high is an important and practical issue. This issue is especially significant in critical situations such as natural disasters, floods, and earthquakes. For the reliable path, the survival or blocking of each arc of the network in critical situations is an un...
Journal:
- CoRR
Volume: abs/1610.02072
Pages: -
Publication date: 2016